GH-3475: Fix parquet-vector compatibility with Java > 17 (#3476)
iemejia wants to merge 1 commit into apache:master
Conversation
Replace the ByteBuffer-specific vector loads with local helpers that copy the required bytes and then call ByteVector.fromArray. This removes the dependency on JDK-specific ByteVector.fromByteBuffer entry points, which can fail with NoSuchMethodError on newer runtimes. Assisted-by: OpenCode:gpt-5.4
Pull request overview
Updates the vector-encoding plugin to avoid JDK-specific ByteVector.fromByteBuffer(...) entry points (which can break on newer Java runtimes), and expands CI coverage to build/test the vector plugins on more JDK versions.
Changes:
- Replace ByteVector.fromByteBuffer(...) loads with local fromByteBuffer(...) helpers that copy bytes into an array and call ByteVector.fromArray(...).
- Add helper methods to read bytes from ByteBuffer for vector loads (masked and unmasked).
- Expand the vector-plugins GitHub Actions matrix to run on JDK 17/21/25 and conditionally skip Spotless on non-17 JDKs; ignore Eclipse .factorypath.
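The helpers described above must read from the ByteBuffer without disturbing the caller's position. A minimal standalone sketch (hypothetical code, not taken from the PR) of the two common patterns for doing that, assuming a plain heap buffer:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BufferReadPatterns {
    // Copy byteCount bytes starting at inPos via duplicate(): the duplicate
    // shares content but has an independent position/limit, so the original
    // buffer's state is untouched by the relative bulk get.
    static byte[] viaDuplicate(ByteBuffer input, int inPos, int byteCount) {
        byte[] bytes = new byte[byteCount];
        ByteBuffer source = input.duplicate();
        source.position(inPos);
        source.get(bytes);
        return bytes;
    }

    // Same result using absolute gets; no temporary ByteBuffer is allocated
    // and the input's position is never touched at all.
    static byte[] viaAbsoluteGet(ByteBuffer input, int inPos, int byteCount) {
        byte[] bytes = new byte[byteCount];
        for (int i = 0; i < byteCount; i++) {
            bytes[i] = input.get(inPos + i);
        }
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {9, 8, 7, 6, 5});
        System.out.println(Arrays.equals(
                viaDuplicate(buf, 1, 3), viaAbsoluteGet(buf, 1, 3))); // true
        System.out.println(buf.position()); // 0: original buffer unchanged
    }
}
```

Both patterns appear in this PR's discussion: the committed code uses duplicate(), while the review below suggests absolute gets to avoid the extra ByteBuffer allocation.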
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| parquet-plugins/parquet-encoding-vector/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBitPacking512VectorLE.java | Reworks ByteBuffer-based vector loads to go through array-backed helpers instead of fromByteBuffer. |
| .github/workflows/vector-plugins.yml | Expands CI to additional Java versions and adjusts Spotless skipping behavior. |
| .gitignore | Ignores .factorypath. |
private static ByteVector fromByteBuffer(
    VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
  return ByteVector.fromArray(species, readInputBytes(input, inPos, mask.trueCount()), 0, mask);
}
In the masked overload, readInputBytes(..., mask.trueCount()) creates a backing array smaller than species.length(). ByteVector.fromArray(species, array, 0, mask) may still perform bounds checks assuming array.length >= species.length() (and also breaks immediately if a mask ever has a true lane at an index >= array.length). To keep masked loads safe across JDK implementations, allocate an array of species.length() and only fill the needed prefix bytes (e.g., mask.trueCount()), leaving the rest as zero.
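The fix the reviewer describes can be sketched without the Vector API itself: allocate the backing array at the full species length and fill only the masked prefix. A minimal illustration, where SPECIES_LENGTH is a hypothetical stand-in for species.length() (64 for a 512-bit byte species):

```java
import java.nio.ByteBuffer;

public class MaskedReadSketch {
    // Stand-in for species.length(); 64 lanes for a 512-bit byte species.
    static final int SPECIES_LENGTH = 64;

    // Allocate the full species length so ByteVector.fromArray's bounds
    // checks are satisfied; fill only the first byteCount bytes (the lanes
    // the mask covers) and leave the remaining lanes zero.
    static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
        byte[] bytes = new byte[SPECIES_LENGTH];
        for (int i = 0; i < byteCount; i++) {
            bytes[i] = input.get(inPos + i); // absolute get: position untouched
        }
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        byte[] lanes = readInputBytes(buf, 2, 3); // copy 3 bytes from index 2
        System.out.println(lanes.length); // 64
        System.out.println(lanes[0]);     // 3
        System.out.println(lanes[2]);     // 5
        System.out.println(lanes[3]);     // 0 (untouched lane)
    }
}
```

Note this sizing only matches the mask when the true lanes form a contiguous prefix, which holds for the tail-masking pattern used here; a mask with a true lane past the filled prefix would read a zero instead of buffer data.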
private static ByteVector fromByteBuffer(VectorSpecies<Byte> species, ByteBuffer input, int inPos) {
  return ByteVector.fromArray(species, readInputBytes(input, inPos, species.length()), 0);
}

private static ByteVector fromByteBuffer(
    VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
  return ByteVector.fromArray(species, readInputBytes(input, inPos, mask.trueCount()), 0, mask);
}

private static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
  byte[] bytes = new byte[byteCount];
  ByteBuffer source = input.duplicate();
  source.position(inPos);
  source.get(bytes);
readInputBytes allocates a new byte[] and creates a new ByteBuffer via duplicate() on every vector unpack from a ByteBuffer. This is a significant per-call allocation/copy regression compared to the previous ByteVector.fromByteBuffer(...) path and can materially impact the performance benefits of the vector plugin. Consider reducing allocations by reusing a scratch buffer (e.g., via ThreadLocal<byte[]> sized to the max species length) and/or fast-pathing heap buffers (input.hasArray()) to load directly from the backing array when possible.
Suggested change:

Before:

private static ByteVector fromByteBuffer(VectorSpecies<Byte> species, ByteBuffer input, int inPos) {
  return ByteVector.fromArray(species, readInputBytes(input, inPos, species.length()), 0);
}

private static ByteVector fromByteBuffer(
    VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
  return ByteVector.fromArray(species, readInputBytes(input, inPos, mask.trueCount()), 0, mask);
}

private static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
  byte[] bytes = new byte[byteCount];
  ByteBuffer source = input.duplicate();
  source.position(inPos);
  source.get(bytes);

After:

private static final ThreadLocal<byte[]> INPUT_SCRATCH =
    ThreadLocal.withInitial(() -> new byte[B512.length()]);

private static ByteVector fromByteBuffer(VectorSpecies<Byte> species, ByteBuffer input, int inPos) {
  if (input.hasArray()) {
    return ByteVector.fromArray(species, input.array(), input.arrayOffset() + inPos);
  }
  return ByteVector.fromArray(species, readInputBytes(input, inPos, species.length()), 0);
}

private static ByteVector fromByteBuffer(
    VectorSpecies<Byte> species, ByteBuffer input, int inPos, VectorMask<Byte> mask) {
  if (input.hasArray()) {
    return ByteVector.fromArray(species, input.array(), input.arrayOffset() + inPos, mask);
  }
  return ByteVector.fromArray(species, readInputBytes(input, inPos, mask.trueCount()), 0, mask);
}

private static byte[] readInputBytes(ByteBuffer input, int inPos, int byteCount) {
  byte[] bytes = INPUT_SCRATCH.get();
  for (int i = 0; i < byteCount; i++) {
    bytes[i] = input.get(inPos + i);
  }